Tutorial 4 shows one way to plan the design of a future experiment
using EHA.
4. Build an initial model
It is often simpler to use an index coding approach when building
regression models.
See this link for some background on index coding: https://bookdown.org/content/4857/the-many-variables-the-spurious-waffles.html#many-categories.
For specifying interaction terms with index coding in brms, see: https://discourse.mc-stan.org/t/specifying-formulates-for-interaction-between-categorical-variables-with-the-index-coding-approach-in-brms/29449/3
In the context of data simulation, index coding can be particularly
useful, because it is often easier and more intuitive to set condition
means than to set more complicated regression parameters, such as
interaction terms.
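As a minimal sketch of the difference (with hypothetical outcome and predictor names, not the tutorial's own objects), the two codings differ only in whether the intercept is dropped from the formula:

```r
# Treatment (dummy) coding: an intercept (the reference level's mean)
# plus offsets for the other levels.
f_treatment <- outcome ~ 1 + prime

# Index coding: drop the intercept, so each level of "prime" gets its
# own mean parameter, which is often easier to set priors for.
f_index <- outcome ~ 0 + prime
```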
The approach here is to build an initial model using an index coding
approach, but with a subset of the data that was used in prior
tutorials.
Because we are building Bayesian regression models, which generate a
posterior distribution per condition of interest under an index coding
approach, we can simply calculate contrasts of interest, along with
associated precision or interval estimates (95% quantile intervals, for
example), to address any questions about comparisons between conditions,
such as incongruent vs. congruent in a particular time bin.
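To sketch that idea concretely (using simulated draws and hypothetical parameter names standing in for a real posterior), a contrast is just the draw-wise difference between two condition means, summarised with quantile intervals:

```r
set.seed(1)
# hypothetical posterior draws for two condition means (cloglog scale)
draws <- data.frame(
  b_inc = rnorm(4000, mean = -2.3, sd = 0.4),
  b_con = rnorm(4000, mean = -2.4, sd = 0.3)
)
# the contrast of interest: incongruent minus congruent, per draw
contrast <- draws$b_inc - draws$b_con
# median and 95% quantile interval of the contrast
quantile(contrast, probs = c(0.025, 0.5, 0.975))
```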
check the priors available
get_prior(formula,
          data = data, family = bernoulli(link = "cloglog"))
## prior class coef group resp dpar nlpar lb ub
## (flat) b
## (flat) b timebin4:primecon
## (flat) b timebin4:primeinc
## (flat) b timebin5:primecon
## (flat) b timebin5:primeinc
## (flat) b timebin6:primecon
## (flat) b timebin6:primeinc
## (flat) b timebin7:primecon
## (flat) b timebin7:primeinc
## (flat) b timebin8:primecon
## (flat) b timebin8:primeinc
## (flat) b timebin9:primecon
## (flat) b timebin9:primeinc
## lkj(1) cor
## lkj(1) cor pid
## student_t(3, 0, 2.5) sd 0
## student_t(3, 0, 2.5) sd pid 0
## student_t(3, 0, 2.5) sd timebin4:primecon pid 0
## student_t(3, 0, 2.5) sd timebin4:primeinc pid 0
## student_t(3, 0, 2.5) sd timebin5:primecon pid 0
## student_t(3, 0, 2.5) sd timebin5:primeinc pid 0
## student_t(3, 0, 2.5) sd timebin6:primecon pid 0
## student_t(3, 0, 2.5) sd timebin6:primeinc pid 0
## student_t(3, 0, 2.5) sd timebin7:primecon pid 0
## student_t(3, 0, 2.5) sd timebin7:primeinc pid 0
## student_t(3, 0, 2.5) sd timebin8:primecon pid 0
## student_t(3, 0, 2.5) sd timebin8:primeinc pid 0
## student_t(3, 0, 2.5) sd timebin9:primecon pid 0
## student_t(3, 0, 2.5) sd timebin9:primeinc pid 0
## source
## default
## (vectorized)
## (vectorized)
## (vectorized)
## (vectorized)
## (vectorized)
## (vectorized)
## (vectorized)
## (vectorized)
## (vectorized)
## (vectorized)
## (vectorized)
## (vectorized)
## default
## (vectorized)
## default
## (vectorized)
## (vectorized)
## (vectorized)
## (vectorized)
## (vectorized)
## (vectorized)
## (vectorized)
## (vectorized)
## (vectorized)
## (vectorized)
## (vectorized)
## (vectorized)
## (vectorized)
visualise priors
Using an index coding approach, each condition has a prior for the
mean of that condition. As we know from plot 3.1 above, the condition
averages across the six selected time bins range from near zero on the
hazard probability scale to 0.50. In principle, the more general case
could involve any hazard value from 0 to 1. Therefore, if we stick to a
weakly informative approach to priors (see here for more details),
priors for the cloglog values should cover a fairly broad range.
For a look at values across the cloglog scale and how it compares to
the logit scale, see here.
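For a quick self-contained look (a sketch, separate from the tutorial pipeline), we can tabulate both transforms at a few probabilities. Note that, unlike the logit, the cloglog is asymmetric around p = 0.5:

```r
# compare the cloglog and logit transforms at a few probabilities
p <- c(0.01, 0.10, 0.50, 0.90, 0.99)
cloglog <- log(-log(1 - p))  # cloglog(p)
logit   <- log(p / (1 - p))  # logit(p)
round(data.frame(p, cloglog, logit), 3)
```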
Ok, so let’s take a look at some cloglog values at the lower and
upper end of the probability scale.
cloglog_vals = clogloglink(c(0.01,0.99))
cloglog_vals
## [1] -4.600149 1.527180
and now take a look at some distributions
visualize("normal(0, 2)", "normal(0, 1)", "normal(0, 0.5)",
          xlim = c(-4, 4))

normal(0, 2) looks like it covers all of our likely values, even if it
is somewhat broad.
For simplicity, we use these normal(0,2) priors for the mean per
condition. However, if we were doing this for real, we would probably
choose different priors that are similar to ones that were discussed in
Tutorial_2a, as well as in Supplementary Materials.
set priors
We set the prior for the mean per condition to normal(0, 2) and use
default recommendations from McElreath (2020) for the rest.
priors <- c(
  set_prior("normal(0, 2)", class = "b"),
  set_prior("normal(0, 1)", class = "sd"),
  set_prior("lkj(2)", class = "cor")
)
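One way to see what normal(0, 2) implies on the hazard scale is a rough prior predictive sketch (illustrative only, separate from the model code above): draw from the prior and push the draws through the inverse cloglog, 1 - exp(-exp(x)).

```r
set.seed(123)
# draws from the normal(0, 2) prior on the cloglog scale
prior_draws <- rnorm(10000, mean = 0, sd = 2)
# map to hazard probabilities via the inverse cloglog
implied_hazard <- 1 - exp(-exp(prior_draws))
# the implied hazards span most of the 0-1 range
round(quantile(implied_hazard, probs = c(0.025, 0.5, 0.975)), 3)
```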
run the model
This model takes ~17 minutes to build on a 2020 MacBook Pro (2 GHz
Quad-Core Intel Core i5). For this tutorial, the chunk is skipped to
save time; set eval=TRUE in the code chunk to run the model.
plan(multicore)
bi <- brm(formula = formula,
          data = data, family = bernoulli(link = "cloglog"),
          prior = priors,
          iter = 2000, warmup = 1000, cores = 8, chains = 4,
          control = list(adapt_delta = 0.95),
          save_pars = save_pars(all = TRUE),
          seed = 123,
          init = 0.01,
          file = "Tutorial_4_planning/models/bi")
summary(bi)
The model built without any concerns, errors or warnings.
At this point, you would normally do the following:
- Check the model by looking at the convergence of chains and other
model diagnostics. We'll skip this step for now, to keep the focus on
planning and data simulation.
- Summarise and visualise the parameter estimates and/or posterior
predictions.
Below, we quickly summarise fixed effects from the model in cloglog
values, and we also convert them to hazard values, which tend to be far
easier to interpret.
visualise fixed effect parameters from model bi
read in the existing model object (if you did not build the model
yourself)
bi <- readRDS("Tutorial_4_planning/models/bi.rds")
summary(bi)
## Family: bernoulli
## Links: mu = cloglog
## Formula: outcome ~ 0 + timebin:prime + (0 + timebin:prime | pid)
## Data: data (Number of observations: 7385)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Multilevel Hyperparameters:
## ~pid (Number of levels: 6)
## Estimate Est.Error l-95% CI u-95% CI
## sd(timebin4:primecon) 1.06 0.53 0.15 2.24
## sd(timebin5:primecon) 1.34 0.44 0.66 2.35
## sd(timebin6:primecon) 0.60 0.26 0.25 1.26
## sd(timebin7:primecon) 0.47 0.21 0.19 0.98
## sd(timebin8:primecon) 0.58 0.25 0.24 1.24
## sd(timebin9:primecon) 0.72 0.28 0.34 1.41
## sd(timebin4:primeinc) 1.17 0.52 0.31 2.37
## sd(timebin5:primeinc) 1.05 0.41 0.46 2.06
## sd(timebin6:primeinc) 0.91 0.32 0.45 1.67
## sd(timebin7:primeinc) 1.47 0.43 0.77 2.44
## sd(timebin8:primeinc) 1.34 0.43 0.67 2.33
## sd(timebin9:primeinc) 0.62 0.32 0.12 1.38
## cor(timebin4:primecon,timebin5:primecon) 0.18 0.24 -0.30 0.60
## cor(timebin4:primecon,timebin6:primecon) 0.14 0.25 -0.35 0.59
## cor(timebin5:primecon,timebin6:primecon) 0.19 0.23 -0.27 0.63
## cor(timebin4:primecon,timebin7:primecon) 0.15 0.24 -0.33 0.60
## cor(timebin5:primecon,timebin7:primecon) 0.15 0.24 -0.32 0.60
## cor(timebin6:primecon,timebin7:primecon) 0.16 0.24 -0.32 0.61
## cor(timebin4:primecon,timebin8:primecon) 0.06 0.25 -0.41 0.53
## cor(timebin5:primecon,timebin8:primecon) 0.00 0.23 -0.44 0.45
## cor(timebin6:primecon,timebin8:primecon) 0.03 0.24 -0.44 0.49
## cor(timebin7:primecon,timebin8:primecon) 0.08 0.24 -0.40 0.54
## cor(timebin4:primecon,timebin9:primecon) 0.08 0.25 -0.41 0.55
## cor(timebin5:primecon,timebin9:primecon) 0.09 0.23 -0.38 0.52
## cor(timebin6:primecon,timebin9:primecon) 0.05 0.24 -0.43 0.51
## cor(timebin7:primecon,timebin9:primecon) 0.05 0.24 -0.41 0.52
## cor(timebin8:primecon,timebin9:primecon) 0.18 0.24 -0.31 0.62
## cor(timebin4:primecon,timebin4:primeinc) 0.16 0.25 -0.34 0.61
## cor(timebin5:primecon,timebin4:primeinc) 0.22 0.23 -0.24 0.65
## cor(timebin6:primecon,timebin4:primeinc) 0.17 0.24 -0.32 0.61
## cor(timebin7:primecon,timebin4:primeinc) 0.15 0.24 -0.31 0.60
## cor(timebin8:primecon,timebin4:primeinc) 0.04 0.24 -0.42 0.49
## cor(timebin9:primecon,timebin4:primeinc) 0.08 0.24 -0.39 0.54
## cor(timebin4:primecon,timebin5:primeinc) 0.15 0.24 -0.33 0.60
## cor(timebin5:primecon,timebin5:primeinc) 0.22 0.23 -0.25 0.65
## cor(timebin6:primecon,timebin5:primeinc) 0.16 0.24 -0.31 0.61
## cor(timebin7:primecon,timebin5:primeinc) 0.16 0.25 -0.32 0.61
## cor(timebin8:primecon,timebin5:primeinc) 0.04 0.24 -0.43 0.48
## cor(timebin9:primecon,timebin5:primeinc) 0.04 0.24 -0.44 0.50
## cor(timebin4:primeinc,timebin5:primeinc) 0.17 0.24 -0.31 0.62
## cor(timebin4:primecon,timebin6:primeinc) 0.13 0.24 -0.34 0.58
## cor(timebin5:primecon,timebin6:primeinc) 0.15 0.23 -0.31 0.57
## cor(timebin6:primecon,timebin6:primeinc) 0.19 0.24 -0.29 0.64
## cor(timebin7:primecon,timebin6:primeinc) 0.18 0.24 -0.30 0.62
## cor(timebin8:primecon,timebin6:primeinc) 0.07 0.23 -0.39 0.52
## cor(timebin9:primecon,timebin6:primeinc) 0.01 0.23 -0.43 0.47
## cor(timebin4:primeinc,timebin6:primeinc) 0.15 0.23 -0.31 0.58
## cor(timebin5:primeinc,timebin6:primeinc) 0.16 0.23 -0.30 0.60
## cor(timebin4:primecon,timebin7:primeinc) 0.09 0.24 -0.39 0.53
## cor(timebin5:primecon,timebin7:primeinc) 0.08 0.22 -0.35 0.50
## cor(timebin6:primecon,timebin7:primeinc) 0.14 0.23 -0.30 0.58
## cor(timebin7:primecon,timebin7:primeinc) 0.14 0.23 -0.31 0.57
## cor(timebin8:primecon,timebin7:primeinc) 0.20 0.23 -0.25 0.62
## cor(timebin9:primecon,timebin7:primeinc) 0.11 0.23 -0.35 0.55
## cor(timebin4:primeinc,timebin7:primeinc) 0.10 0.24 -0.37 0.53
## cor(timebin5:primeinc,timebin7:primeinc) 0.11 0.23 -0.33 0.55
## cor(timebin6:primeinc,timebin7:primeinc) 0.21 0.23 -0.25 0.63
## cor(timebin4:primecon,timebin8:primeinc) 0.07 0.24 -0.40 0.54
## cor(timebin5:primecon,timebin8:primeinc) 0.03 0.22 -0.40 0.46
## cor(timebin6:primecon,timebin8:primeinc) 0.09 0.23 -0.36 0.53
## cor(timebin7:primecon,timebin8:primeinc) 0.14 0.24 -0.32 0.57
## cor(timebin8:primecon,timebin8:primeinc) 0.17 0.23 -0.31 0.59
## cor(timebin9:primecon,timebin8:primeinc) 0.01 0.23 -0.43 0.47
## cor(timebin4:primeinc,timebin8:primeinc) 0.05 0.24 -0.41 0.50
## cor(timebin5:primeinc,timebin8:primeinc) 0.11 0.23 -0.33 0.54
## cor(timebin6:primeinc,timebin8:primeinc) 0.19 0.23 -0.27 0.61
## cor(timebin7:primeinc,timebin8:primeinc) 0.25 0.22 -0.19 0.65
## cor(timebin4:primecon,timebin9:primeinc) 0.06 0.25 -0.44 0.54
## cor(timebin5:primecon,timebin9:primeinc) 0.04 0.24 -0.42 0.48
## cor(timebin6:primecon,timebin9:primeinc) 0.01 0.24 -0.46 0.50
## cor(timebin7:primecon,timebin9:primeinc) 0.04 0.24 -0.44 0.50
## cor(timebin8:primecon,timebin9:primeinc) 0.09 0.25 -0.40 0.54
## cor(timebin9:primecon,timebin9:primeinc) 0.15 0.25 -0.35 0.61
## cor(timebin4:primeinc,timebin9:primeinc) 0.05 0.25 -0.46 0.51
## cor(timebin5:primeinc,timebin9:primeinc) -0.01 0.24 -0.47 0.47
## cor(timebin6:primeinc,timebin9:primeinc) -0.01 0.24 -0.48 0.45
## cor(timebin7:primeinc,timebin9:primeinc) -0.01 0.24 -0.47 0.44
## cor(timebin8:primeinc,timebin9:primeinc) -0.04 0.24 -0.50 0.43
## Rhat Bulk_ESS Tail_ESS
## sd(timebin4:primecon) 1.00 1575 946
## sd(timebin5:primecon) 1.00 2481 2988
## sd(timebin6:primecon) 1.00 2305 3263
## sd(timebin7:primecon) 1.00 2073 2563
## sd(timebin8:primecon) 1.00 2334 2939
## sd(timebin9:primecon) 1.00 2575 3003
## sd(timebin4:primeinc) 1.00 2200 1735
## sd(timebin5:primeinc) 1.00 3162 2990
## sd(timebin6:primeinc) 1.00 2699 3051
## sd(timebin7:primeinc) 1.00 2835 2542
## sd(timebin8:primeinc) 1.00 3978 3135
## sd(timebin9:primeinc) 1.00 1920 1493
## cor(timebin4:primecon,timebin5:primecon) 1.00 2579 2755
## cor(timebin4:primecon,timebin6:primecon) 1.00 3235 2896
## cor(timebin5:primecon,timebin6:primecon) 1.00 3775 3240
## cor(timebin4:primecon,timebin7:primecon) 1.00 3403 2959
## cor(timebin5:primecon,timebin7:primecon) 1.00 3891 3211
## cor(timebin6:primecon,timebin7:primecon) 1.00 3464 3294
## cor(timebin4:primecon,timebin8:primecon) 1.00 2597 2992
## cor(timebin5:primecon,timebin8:primecon) 1.00 3771 3027
## cor(timebin6:primecon,timebin8:primecon) 1.00 3425 3031
## cor(timebin7:primecon,timebin8:primecon) 1.00 3417 3089
## cor(timebin4:primecon,timebin9:primecon) 1.00 3508 3062
## cor(timebin5:primecon,timebin9:primecon) 1.00 3333 3269
## cor(timebin6:primecon,timebin9:primecon) 1.00 3878 3057
## cor(timebin7:primecon,timebin9:primecon) 1.00 3440 3278
## cor(timebin8:primecon,timebin9:primecon) 1.00 3203 2802
## cor(timebin4:primecon,timebin4:primeinc) 1.00 3562 2973
## cor(timebin5:primecon,timebin4:primeinc) 1.00 3666 2937
## cor(timebin6:primecon,timebin4:primeinc) 1.00 4437 3223
## cor(timebin7:primecon,timebin4:primeinc) 1.00 3549 3408
## cor(timebin8:primecon,timebin4:primeinc) 1.00 4029 3557
## cor(timebin9:primecon,timebin4:primeinc) 1.00 4154 3375
## cor(timebin4:primecon,timebin5:primeinc) 1.00 3651 3170
## cor(timebin5:primecon,timebin5:primeinc) 1.00 5235 3550
## cor(timebin6:primecon,timebin5:primeinc) 1.00 4083 3353
## cor(timebin7:primecon,timebin5:primeinc) 1.00 3654 3361
## cor(timebin8:primecon,timebin5:primeinc) 1.00 3569 3243
## cor(timebin9:primecon,timebin5:primeinc) 1.00 4171 3303
## cor(timebin4:primeinc,timebin5:primeinc) 1.00 3555 3378
## cor(timebin4:primecon,timebin6:primeinc) 1.00 3405 3068
## cor(timebin5:primecon,timebin6:primeinc) 1.00 4233 3473
## cor(timebin6:primecon,timebin6:primeinc) 1.00 3645 3073
## cor(timebin7:primecon,timebin6:primeinc) 1.00 3966 3441
## cor(timebin8:primecon,timebin6:primeinc) 1.00 3942 3390
## cor(timebin9:primecon,timebin6:primeinc) 1.00 3490 3451
## cor(timebin4:primeinc,timebin6:primeinc) 1.00 3394 3450
## cor(timebin5:primeinc,timebin6:primeinc) 1.00 3364 3127
## cor(timebin4:primecon,timebin7:primeinc) 1.00 2789 2925
## cor(timebin5:primecon,timebin7:primeinc) 1.00 3968 3263
## cor(timebin6:primecon,timebin7:primeinc) 1.00 3561 3246
## cor(timebin7:primecon,timebin7:primeinc) 1.00 3904 3129
## cor(timebin8:primecon,timebin7:primeinc) 1.00 4104 3483
## cor(timebin9:primecon,timebin7:primeinc) 1.00 3650 3046
## cor(timebin4:primeinc,timebin7:primeinc) 1.00 3065 3355
## cor(timebin5:primeinc,timebin7:primeinc) 1.00 3444 3085
## cor(timebin6:primeinc,timebin7:primeinc) 1.00 3049 3637
## cor(timebin4:primecon,timebin8:primeinc) 1.00 3448 3249
## cor(timebin5:primecon,timebin8:primeinc) 1.00 4224 2985
## cor(timebin6:primecon,timebin8:primeinc) 1.00 3937 3479
## cor(timebin7:primecon,timebin8:primeinc) 1.00 3616 3515
## cor(timebin8:primecon,timebin8:primeinc) 1.00 3591 3346
## cor(timebin9:primecon,timebin8:primeinc) 1.00 4010 3616
## cor(timebin4:primeinc,timebin8:primeinc) 1.00 2563 2990
## cor(timebin5:primeinc,timebin8:primeinc) 1.00 3669 3445
## cor(timebin6:primeinc,timebin8:primeinc) 1.00 3298 3284
## cor(timebin7:primeinc,timebin8:primeinc) 1.00 2613 3550
## cor(timebin4:primecon,timebin9:primeinc) 1.00 4337 3141
## cor(timebin5:primecon,timebin9:primeinc) 1.00 4579 3537
## cor(timebin6:primecon,timebin9:primeinc) 1.00 4460 3489
## cor(timebin7:primecon,timebin9:primeinc) 1.00 3474 3310
## cor(timebin8:primecon,timebin9:primeinc) 1.00 3563 3130
## cor(timebin9:primecon,timebin9:primeinc) 1.00 4193 3345
## cor(timebin4:primeinc,timebin9:primeinc) 1.00 3087 3390
## cor(timebin5:primeinc,timebin9:primeinc) 1.00 3446 3189
## cor(timebin6:primeinc,timebin9:primeinc) 1.00 3237 3300
## cor(timebin7:primeinc,timebin9:primeinc) 1.00 3336 3373
## cor(timebin8:primeinc,timebin9:primeinc) 1.00 3066 3561
##
## Regression Coefficients:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## timebin4:primecon -4.66 0.63 -5.96 -3.44 1.00 2425 2603
## timebin5:primecon -3.24 0.61 -4.43 -2.01 1.00 1680 1803
## timebin6:primecon -2.32 0.30 -2.93 -1.72 1.00 2041 2286
## timebin7:primecon -1.49 0.23 -1.96 -1.02 1.00 2030 1901
## timebin8:primecon -1.21 0.28 -1.77 -0.63 1.00 1783 1971
## timebin9:primecon -0.60 0.32 -1.23 0.07 1.00 1931 2216
## timebin4:primeinc -4.37 0.61 -5.57 -3.13 1.00 2868 2386
## timebin5:primeinc -3.21 0.50 -4.18 -2.19 1.00 2349 2460
## timebin6:primeinc -2.37 0.41 -3.12 -1.48 1.00 2078 2205
## timebin7:primeinc -2.27 0.64 -3.56 -0.99 1.00 2088 2403
## timebin8:primeinc -3.04 0.59 -4.14 -1.80 1.00 2662 2729
## timebin9:primeinc -2.45 0.32 -3.11 -1.79 1.00 3116 2732
##
## Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
wrangle the posterior distribution
## take the posterior draws
post <- as_draws_df(bi) %>%
  select(-lp__) %>%
  as_tibble()

## create a summary, using medians with robust = TRUE
post_summary <- posterior_summary(bi, robust = TRUE)

## just look at the fixed effects
post_qi_b <- post %>%
  select(starts_with("b_")) %>%
  pivot_longer(everything()) %>%
  group_by(name) %>%
  median_qi(value)
head(post_qi_b)
## # A tibble: 6 × 7
## name value .lower .upper .width .point .interval
## <chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
## 1 b_timebin4:primecon -4.63 -5.96 -3.44 0.95 median qi
## 2 b_timebin4:primeinc -4.35 -5.57 -3.13 0.95 median qi
## 3 b_timebin5:primecon -3.24 -4.43 -2.01 0.95 median qi
## 4 b_timebin5:primeinc -3.21 -4.18 -2.19 0.95 median qi
## 5 b_timebin6:primecon -2.32 -2.93 -1.72 0.95 median qi
## 6 b_timebin6:primeinc -2.38 -3.12 -1.48 0.95 median qi
a quick plot using posterior samples and tidybayes
tidy_fixed <- post %>%
  select(starts_with("b_"), .chain, .iteration, .draw) %>% # select draws and id columns
  rename(chain = .chain, iter = .iteration, draw = .draw) %>% # rename with simpler labels
  pivot_longer(-c(chain, draw, iter)) %>% # move from wide to long
  mutate(key = factor(name, levels = unique(name)),
         bin = rep(1:6, length.out = 48000),
         prime = rep(c("con", "inc"), each = 6, length.out = 48000),
         bin = factor(bin),
         prime = factor(prime))
head(tidy_fixed)
## # A tibble: 6 × 8
## chain iter draw name value key bin prime
## <int> <int> <int> <chr> <dbl> <fct> <fct> <fct>
## 1 1 1 1 b_timebin4:primecon -6.09 b_timebin4:primecon 1 con
## 2 1 1 1 b_timebin5:primecon -2.32 b_timebin5:primecon 2 con
## 3 1 1 1 b_timebin6:primecon -2.78 b_timebin6:primecon 3 con
## 4 1 1 1 b_timebin7:primecon -1.40 b_timebin7:primecon 4 con
## 5 1 1 1 b_timebin8:primecon -1.32 b_timebin8:primecon 5 con
## 6 1 1 1 b_timebin9:primecon -0.472 b_timebin9:primecon 6 con
tail(tidy_fixed)
## # A tibble: 6 × 8
## chain iter draw name value key bin prime
## <int> <int> <int> <chr> <dbl> <fct> <fct> <fct>
## 1 4 1000 4000 b_timebin4:primeinc -4.50 b_timebin4:primeinc 1 inc
## 2 4 1000 4000 b_timebin5:primeinc -3.07 b_timebin5:primeinc 2 inc
## 3 4 1000 4000 b_timebin6:primeinc -1.23 b_timebin6:primeinc 3 inc
## 4 4 1000 4000 b_timebin7:primeinc -2.26 b_timebin7:primeinc 4 inc
## 5 4 1000 4000 b_timebin8:primeinc -3.07 b_timebin8:primeinc 5 inc
## 6 4 1000 4000 b_timebin9:primeinc -2.34 b_timebin9:primeinc 6 inc
check.labels <- tidy_fixed %>%
  distinct(key, bin, prime)
check.labels
## # A tibble: 12 × 3
## key bin prime
## <fct> <fct> <fct>
## 1 b_timebin4:primecon 1 con
## 2 b_timebin5:primecon 2 con
## 3 b_timebin6:primecon 3 con
## 4 b_timebin7:primecon 4 con
## 5 b_timebin8:primecon 5 con
## 6 b_timebin9:primecon 6 con
## 7 b_timebin4:primeinc 1 inc
## 8 b_timebin5:primeinc 2 inc
## 9 b_timebin6:primeinc 3 inc
## 10 b_timebin7:primeinc 4 inc
## 11 b_timebin8:primeinc 5 inc
## 12 b_timebin9:primeinc 6 inc
# plot
p4.1 <- ggplot(tidy_fixed, aes(x = bin, y = value, fill = prime)) +
  stat_halfeye(alpha = 0.7) +
  labs(title = "Model bi cloglog",
       x = "time bin", y = "cloglog") +
  scale_fill_brewer(palette = "Dark2")
p4.1

ggsave("Tutorial_4_planning/figures/index_cloglog.jpeg",
       width = 6, height = 4, dpi = 800)
calculate hazards per condition per bin in the posterior dist
tidy_haz <- tidy_fixed %>%
  select(chain, iter, draw, bin, prime, value) %>%
  mutate(hazard = exp(value))
head(tidy_haz)
## # A tibble: 6 × 7
## chain iter draw bin prime value hazard
## <int> <int> <int> <fct> <fct> <dbl> <dbl>
## 1 1 1 1 1 con -6.09 0.00227
## 2 1 1 1 2 con -2.32 0.0981
## 3 1 1 1 3 con -2.78 0.0619
## 4 1 1 1 4 con -1.40 0.247
## 5 1 1 1 5 con -1.32 0.266
## 6 1 1 1 6 con -0.472 0.624
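It may help to see how exp(x), used for the hazard above, relates to the inverse of the cloglog link, 1 - exp(-exp(x)) (a self-contained sketch): the two agree closely at the low end of the scale but diverge near the top, and exp(x) can exceed 1.

```r
# compare exp(x) with the inverse cloglog at a few cloglog values
x    <- c(-6, -3, -1, 0, 0.5)
rate <- exp(x)               # as used for "hazard" above; can exceed 1
prob <- 1 - exp(-exp(x))     # inverse cloglog; bounded in (0, 1)
round(data.frame(x, rate, prob), 4)
```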
plot hazards
p4.2 <- ggplot(tidy_haz, aes(x = bin, y = hazard, fill = prime)) +
  stat_halfeye(alpha = 0.7) +
  labs(title = "hazard estimates",
       x = "time bin", y = "hazard") +
  scale_fill_brewer(palette = "Dark2") +
  scale_y_continuous(limits = c(0, 1), breaks = seq(0, 1, 0.25))
p4.2
## Warning: Removed 166 rows containing missing values or values outside the scale range
## (`stat_slabinterval()`).

ggsave("Tutorial_4_planning/figures/index_hazard.jpeg",
       width = 6, height = 4, dpi = 800)
Note the warning about removed rows: hazards calculated as exp(cloglog)
can exceed 1, so draws falling outside the y-axis limits of 0 to 1 are
dropped from the plot.
5. Simulate a single dataset
Note: for multi-level data, data simulation can be quite a convoluted
process, so we strongly recommend reading the power blogs by
Solomon Kurz and working through the faux package vignettes before
getting stuck into the code below.
read in a prior model, if not already loaded.
bi <- readRDS("Tutorial_4_planning/models/bi.rds")
summary(bi)
Define some parameters
# define parameters
# specify some features of the design
subj_n = 10 # number of subjects
rep_n = 200 # number of trial repeats
cond_n = 2 # number of conditions
bin_n = 6 # number of time bins
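As a quick arithmetic check (repeating the design parameters here for self-containment), the fully crossed design implies subj_n x rep_n x cond_n x bin_n rows in the simulated data:

```r
# design parameters, as above
subj_n <- 10   # number of subjects
rep_n  <- 200  # number of trial repeats
cond_n <- 2    # number of conditions
bin_n  <- 6    # number of time bins

# expected number of rows in the simulated dataset
n_rows <- subj_n * rep_n * cond_n * bin_n
n_rows
```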
## set fixed effects
## as a reminder
fixef <- as_tibble(fixef(bi), rownames = "term") %>%
  mutate(across(where(is.double), \(x) round(x, 2)))
fixef
## # A tibble: 12 × 5
## term Estimate Est.Error Q2.5 Q97.5
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 timebin4:primecon -4.66 0.63 -5.96 -3.44
## 2 timebin5:primecon -3.24 0.61 -4.43 -2.01
## 3 timebin6:primecon -2.32 0.3 -2.93 -1.72
## 4 timebin7:primecon -1.49 0.23 -1.96 -1.02
## 5 timebin8:primecon -1.21 0.28 -1.77 -0.63
## 6 timebin9:primecon -0.6 0.32 -1.23 0.07
## 7 timebin4:primeinc -4.37 0.61 -5.57 -3.13
## 8 timebin5:primeinc -3.21 0.5 -4.18 -2.19
## 9 timebin6:primeinc -2.37 0.41 -3.12 -1.48
## 10 timebin7:primeinc -2.27 0.64 -3.56 -0.99
## 11 timebin8:primeinc -3.04 0.59 -4.14 -1.8
## 12 timebin9:primeinc -2.45 0.32 -3.11 -1.79
## b1 to b6 = time bins 4 to 9, relabelled 1 to 6 (as in the plots above).
## c = con
b1c = -4.66
b2c = -3.24
b3c = -2.32
b4c = -1.49
b5c = -1.21
b6c = -0.60
## i = inc
b1i = -4.37
b2i = -3.21
b3i = -2.37
b4i = -2.27
b5i = -3.04
b6i = -2.45
## set varying effects by pid
## as a reminder
varcor <- VarCorr(bi)
glimpse(varcor)
## List of 1
## $ pid:List of 3
## ..$ sd : num [1:12, 1:4] 1.062 1.34 0.599 0.468 0.583 ...
## .. ..- attr(*, "dimnames")=List of 2
## ..$ cor: num [1:12, 1:4, 1:12] 1 0.1811 0.1402 0.1506 0.0557 ...
## .. ..- attr(*, "dimnames")=List of 3
## ..$ cov: num [1:12, 1:4, 1:12] 1.4119 0.2562 0.0839 0.069 0.0335 ...
## .. ..- attr(*, "dimnames")=List of 3
str(varcor)
## List of 1
## $ pid:List of 3
## ..$ sd : num [1:12, 1:4] 1.062 1.34 0.599 0.468 0.583 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:12] "timebin4:primecon" "timebin5:primecon" "timebin6:primecon" "timebin7:primecon" ...
## .. .. ..$ : chr [1:4] "Estimate" "Est.Error" "Q2.5" "Q97.5"
## ..$ cor: num [1:12, 1:4, 1:12] 1 0.1811 0.1402 0.1506 0.0557 ...
## .. ..- attr(*, "dimnames")=List of 3
## .. .. ..$ : chr [1:12] "timebin4:primecon" "timebin5:primecon" "timebin6:primecon" "timebin7:primecon" ...
## .. .. ..$ : chr [1:4] "Estimate" "Est.Error" "Q2.5" "Q97.5"
## .. .. ..$ : chr [1:12] "timebin4:primecon" "timebin5:primecon" "timebin6:primecon" "timebin7:primecon" ...
## ..$ cov: num [1:12, 1:4, 1:12] 1.4119 0.2562 0.0839 0.069 0.0335 ...
## .. ..- attr(*, "dimnames")=List of 3
## .. .. ..$ : chr [1:12] "timebin4:primecon" "timebin5:primecon" "timebin6:primecon" "timebin7:primecon" ...
## .. .. ..$ : chr [1:4] "Estimate" "Est.Error" "Q2.5" "Q97.5"
## .. .. ..$ : chr [1:12] "timebin4:primecon" "timebin5:primecon" "timebin6:primecon" "timebin7:primecon" ...
## extract sd
sd <- as_tibble(varcor$pid$sd, rownames = "term") %>%
  mutate(across(where(is.double), \(x) round(x, 2)))
sd
## # A tibble: 12 × 5
## term Estimate Est.Error Q2.5 Q97.5
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 timebin4:primecon 1.06 0.53 0.15 2.24
## 2 timebin5:primecon 1.34 0.44 0.66 2.35
## 3 timebin6:primecon 0.6 0.26 0.25 1.26
## 4 timebin7:primecon 0.47 0.21 0.19 0.98
## 5 timebin8:primecon 0.58 0.25 0.24 1.24
## 6 timebin9:primecon 0.72 0.28 0.34 1.41
## 7 timebin4:primeinc 1.17 0.52 0.31 2.37
## 8 timebin5:primeinc 1.05 0.41 0.46 2.06
## 9 timebin6:primeinc 0.91 0.32 0.45 1.67
## 10 timebin7:primeinc 1.47 0.43 0.77 2.44
## 11 timebin8:primeinc 1.34 0.43 0.67 2.33
## 12 timebin9:primeinc 0.62 0.32 0.12 1.38
u1c_sd = 1.06 # sd for con, bin 1 (timebin4:primecon)
u2c_sd = 1.34 # sd for con, bin 2 (timebin5:primecon)
u3c_sd = 0.60 # sd for con, bin 3 (timebin6:primecon)
u4c_sd = 0.47 # sd for con, bin 4 (timebin7:primecon)
u5c_sd = 0.58 # sd for con, bin 5 (timebin8:primecon)
u6c_sd = 0.72 # sd for con, bin 6 (timebin9:primecon)
u1i_sd = 1.17 # sd for inc, bin 1 (timebin4:primeinc)
u2i_sd = 1.05 # sd for inc, bin 2 (timebin5:primeinc)
u3i_sd = 0.91 # sd for inc, bin 3 (timebin6:primeinc)
u4i_sd = 1.47 # sd for inc, bin 4 (timebin7:primeinc)
u5i_sd = 1.34 # sd for inc, bin 5 (timebin8:primeinc)
u6i_sd = 0.62 # sd for inc, bin 6 (timebin9:primeinc)
make a correlation matrix
# as a reminder of where the values come from:
# varcor <- VarCorr(bi)
# varcor
# str(varcor)
## take each one
sd <- varcor$pid$sd
cor <- varcor$pid$cor
cov <- varcor$pid$cov
## make it tidy
## pivot longer and use names_pattern to select the parts of the column
## names to separate on. The regex captures each part around the "_"
## separator: ([a-zA-Z_\\d?]+) matches letters and digits before the
## first "_", and (\\d?) allows for optional digits.
cor_mat_tidy <- as_tibble(cor, rownames = "term") %>%
  rename_with(~str_replace_all(.x, '\\.', '_')) %>%
  pivot_longer(-term,
               values_to = "value",
               names_to = c("parameter", "term2"),
               names_pattern = "([a-zA-Z_\\d?]+)_([a-zA-Z:\\d?]+)") %>%
  ## the rest of this code just turns the variables into what we want
  filter(parameter == "Estimate") %>%
  select(-parameter) %>%
  pivot_wider(names_from = "term2",
              values_from = "value") %>%
  select(-term)
head(cor_mat_tidy)
## # A tibble: 6 × 12
## `timebin4:primecon` `timebin5:primecon` `timebin6:primecon`
## <dbl> <dbl> <dbl>
## 1 1 0.181 0.140
## 2 0.181 1 0.194
## 3 0.140 0.194 1
## 4 0.151 0.152 0.160
## 5 0.0557 0.00218 0.0334
## 6 0.0784 0.0865 0.0480
## # ℹ 9 more variables: `timebin7:primecon` <dbl>, `timebin8:primecon` <dbl>,
## # `timebin9:primecon` <dbl>, `timebin4:primeinc` <dbl>,
## # `timebin5:primeinc` <dbl>, `timebin6:primeinc` <dbl>,
## # `timebin7:primeinc` <dbl>, `timebin8:primeinc` <dbl>,
## # `timebin9:primeinc` <dbl>
str(cor_mat_tidy)
## tibble [12 × 12] (S3: tbl_df/tbl/data.frame)
## $ timebin4:primecon: num [1:12] 1 0.1811 0.1402 0.1506 0.0557 ...
## $ timebin5:primecon: num [1:12] 0.18111 1 0.19399 0.15233 0.00218 ...
## $ timebin6:primecon: num [1:12] 0.1402 0.194 1 0.1605 0.0334 ...
## $ timebin7:primecon: num [1:12] 0.1506 0.1523 0.1605 1 0.0786 ...
## $ timebin8:primecon: num [1:12] 0.05571 0.00218 0.03338 0.07859 1 ...
## $ timebin9:primecon: num [1:12] 0.0784 0.0865 0.048 0.0548 0.181 ...
## $ timebin4:primeinc: num [1:12] 0.157 0.2193 0.169 0.1509 0.0387 ...
## $ timebin5:primeinc: num [1:12] 0.1534 0.2218 0.1637 0.1635 0.0405 ...
## $ timebin6:primeinc: num [1:12] 0.1287 0.1457 0.1891 0.1826 0.0658 ...
## $ timebin7:primeinc: num [1:12] 0.0853 0.0849 0.1443 0.143 0.2001 ...
## $ timebin8:primeinc: num [1:12] 0.0739 0.031 0.0924 0.1406 0.1671 ...
## $ timebin9:primeinc: num [1:12] 0.06086 0.03899 0.00569 0.04339 0.08991 ...
## check with the model summary
# summary(bi)
## remove names and make a matrix as faux{} likes it like that
## full cor mat
cor_mat <- unname(as.matrix(cor_mat_tidy))
cor_mat
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 1.00000000 0.181111610 0.140227052 0.15064403 0.055708399 0.07844122
## [2,] 0.18111161 1.000000000 0.193985724 0.15232674 0.002180379 0.08649338
## [3,] 0.14022705 0.193985724 1.000000000 0.16047883 0.033383923 0.04798801
## [4,] 0.15064403 0.152326739 0.160478829 1.00000000 0.078594351 0.05482625
## [5,] 0.05570840 0.002180379 0.033383923 0.07859435 1.000000000 0.18103101
## [6,] 0.07844122 0.086493375 0.047988009 0.05482625 0.181031007 1.00000000
## [7,] 0.15700680 0.219264583 0.168973923 0.15088219 0.038736009 0.08194166
## [8,] 0.15340135 0.221750842 0.163709105 0.16348048 0.040533384 0.03581839
## [9,] 0.12869470 0.145655326 0.189094128 0.18261210 0.065750253 0.01304757
## [10,] 0.08533370 0.084865176 0.144296210 0.14300113 0.200118561 0.11201101
## [11,] 0.07387438 0.030988328 0.092439132 0.14063153 0.167079452 0.01372982
## [12,] 0.06086174 0.038991659 0.005693736 0.04338987 0.089912956 0.15443839
## [,7] [,8] [,9] [,10] [,11] [,12]
## [1,] 0.15700680 0.15340135 0.12869470 0.08533370 0.07387438 0.060861737
## [2,] 0.21926458 0.22175084 0.14565533 0.08486518 0.03098833 0.038991659
## [3,] 0.16897392 0.16370910 0.18909413 0.14429621 0.09243913 0.005693736
## [4,] 0.15088219 0.16348048 0.18261210 0.14300113 0.14063153 0.043389867
## [5,] 0.03873601 0.04053338 0.06575025 0.20011856 0.16707945 0.089912956
## [6,] 0.08194166 0.03581839 0.01304757 0.11201101 0.01372982 0.154438395
## [7,] 1.00000000 0.17417672 0.15048835 0.09722597 0.04902133 0.047558257
## [8,] 0.17417672 1.00000000 0.15883356 0.11194773 0.11333808 -0.011738558
## [9,] 0.15048835 0.15883356 1.00000000 0.21397253 0.18646605 -0.014087671
## [10,] 0.09722597 0.11194773 0.21397253 1.00000000 0.25374893 -0.012638958
## [11,] 0.04902133 0.11333808 0.18646605 0.25374893 1.00000000 -0.039673970
## [12,] 0.04755826 -0.01173856 -0.01408767 -0.01263896 -0.03967397 1.000000000
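Before handing a correlation matrix to faux, it is worth confirming it is valid, i.e. symmetric and positive definite (all eigenvalues greater than zero), since the multivariate draw will fail otherwise. A sketch of that check with a small stand-in matrix:

```r
# a small stand-in correlation matrix (hypothetical values)
m <- matrix(c(1.0, 0.2, 0.1,
              0.2, 1.0, 0.3,
              0.1, 0.3, 1.0), nrow = 3)

# check symmetry and positive definiteness; the same check can be
# applied to cor_mat above
stopifnot(isSymmetric(m), all(eigen(m)$values > 0))
```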
set up the data structure
# make it reproducible
set.seed(1)

d1 <- add_random(subj = subj_n, rep = rep_n) %>%
  add_within("subj", condition = c("cond1", "cond2")) %>%
  add_contrast("condition", "treatment", add_cols = TRUE,
               colnames = c("cond")) %>%
  add_within("rep", bin = 1:bin_n) %>%
  ## create indicator columns that respect the twelve levels (b1c, b1i, etc.)
  mutate(con = if_else(condition == "cond1", 1, 0),
         inc = if_else(condition == "cond2", 1, 0)) %>%
  mutate(bin1 = if_else(bin == 1, 1, 0),
         bin2 = if_else(bin == 2, 1, 0),
         bin3 = if_else(bin == 3, 1, 0),
         bin4 = if_else(bin == 4, 1, 0),
         bin5 = if_else(bin == 5, 1, 0),
         bin6 = if_else(bin == 6, 1, 0)) %>%
  # add random effects
  add_ranef("subj", u1c = u1c_sd, u2c = u2c_sd, u3c = u3c_sd, u4c = u4c_sd,
            u5c = u5c_sd, u6c = u6c_sd,
            u1i = u1i_sd, u2i = u2i_sd, u3i = u3i_sd, u4i = u4i_sd,
            u5i = u5i_sd, u6i = u6i_sd,
            .cors = cor_mat) %>%
  # calculate the cloglog value per observation
  mutate(cloglog = (b1c + u1c) * con * bin1 + (b2c + u2c) * con * bin2 +
           (b3c + u3c) * con * bin3 + (b4c + u4c) * con * bin4 +
           (b5c + u5c) * con * bin5 + (b6c + u6c) * con * bin6 +
           (b1i + u1i) * inc * bin1 + (b2i + u2i) * inc * bin2 +
           (b3i + u3i) * inc * bin3 + (b4i + u4i) * inc * bin4 +
           (b5i + u5i) * inc * bin5 + (b6i + u6i) * inc * bin6) %>%
  # convert to probability via the inverse cloglog
  mutate(prob = clogloglink(cloglog, inverse = TRUE)) %>%
  # generate the event outcome
  mutate(event = rbinom(n(), 1, prob))
head(d1)
## # A tibble: 6 × 28
## subj rep condition cond bin con inc bin1 bin2 bin3 bin4 bin5
## <chr> <chr> <fct> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 subj01 rep001 cond1 0 1 1 0 1 0 0 0 0
## 2 subj01 rep001 cond1 0 2 1 0 0 1 0 0 0
## 3 subj01 rep001 cond1 0 3 1 0 0 0 1 0 0
## 4 subj01 rep001 cond1 0 4 1 0 0 0 0 1 0
## 5 subj01 rep001 cond1 0 5 1 0 0 0 0 0 1
## 6 subj01 rep001 cond1 0 6 1 0 0 0 0 0 0
## # ℹ 16 more variables: bin6 <dbl>, u1c <dbl>, u2c <dbl>, u3c <dbl>, u4c <dbl>,
## # u5c <dbl>, u6c <dbl>, u1i <dbl>, u2i <dbl>, u3i <dbl>, u4i <dbl>,
## # u5i <dbl>, u6i <dbl>, cloglog <dbl>, prob <dbl>, event <int>
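The event-generation step at the end of the pipeline can be illustrated in isolation (a minimal self-contained sketch with a hypothetical cloglog value, separate from d1): map a cloglog value to a probability with the inverse link, then draw a Bernoulli event per trial.

```r
set.seed(2)
eta <- -1.5                  # a hypothetical cloglog value
p   <- 1 - exp(-exp(eta))    # inverse cloglog, approx. 0.20 here

# one Bernoulli draw per trial, as in the mutate(event = ...) step
events <- rbinom(1000, size = 1, prob = p)
mean(events)                 # should sit near p
```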
str(d1)
## tibble [24,000 × 28] (S3: tbl_df/tbl/data.frame)
## $ subj : chr [1:24000] "subj01" "subj01" "subj01" "subj01" ...
## $ rep : chr [1:24000] "rep001" "rep001" "rep001" "rep001" ...
## $ condition: Factor w/ 2 levels "cond1","cond2": 1 1 1 1 1 1 2 2 2 2 ...
## ..- attr(*, "contrasts")= num [1:2, 1] 0 1
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:2] "cond1" "cond2"
## .. .. ..$ : chr "cond"
## $ cond : num [1:24000] 0 0 0 0 0 0 1 1 1 1 ...
## $ bin : int [1:24000] 1 2 3 4 5 6 1 2 3 4 ...
## $ con : num [1:24000] 1 1 1 1 1 1 0 0 0 0 ...
## $ inc : num [1:24000] 0 0 0 0 0 0 1 1 1 1 ...
## $ bin1 : num [1:24000] 1 0 0 0 0 0 1 0 0 0 ...
## $ bin2 : num [1:24000] 0 1 0 0 0 0 0 1 0 0 ...
## $ bin3 : num [1:24000] 0 0 1 0 0 0 0 0 1 0 ...
## $ bin4 : num [1:24000] 0 0 0 1 0 0 0 0 0 1 ...
## $ bin5 : num [1:24000] 0 0 0 0 1 0 0 0 0 0 ...
## $ bin6 : num [1:24000] 0 0 0 0 0 1 0 0 0 0 ...
## $ u1c : num [1:24000] 0.361 0.361 0.361 0.361 0.361 ...
## $ u2c : num [1:24000] 2.76 2.76 2.76 2.76 2.76 ...
## $ u3c : num [1:24000] 0.0637 0.0637 0.0637 0.0637 0.0637 ...
## $ u4c : num [1:24000] -0.114 -0.114 -0.114 -0.114 -0.114 ...
## $ u5c : num [1:24000] -0.491 -0.491 -0.491 -0.491 -0.491 ...
## $ u6c : num [1:24000] -0.6 -0.6 -0.6 -0.6 -0.6 ...
## $ u1i : num [1:24000] -0.12 -0.12 -0.12 -0.12 -0.12 ...
## $ u2i : num [1:24000] -0.0433 -0.0433 -0.0433 -0.0433 -0.0433 ...
## $ u3i : num [1:24000] 2.13 2.13 2.13 2.13 2.13 ...
## $ u4i : num [1:24000] -0.934 -0.934 -0.934 -0.934 -0.934 ...
## $ u5i : num [1:24000] 0.507 0.507 0.507 0.507 0.507 ...
## $ u6i : num [1:24000] 0.155 0.155 0.155 0.155 0.155 ...
## $ cloglog : num [1:24000] -4.299 -0.479 -2.256 -1.604 -1.701 ...
## $ prob : num [1:24000] 0.0135 0.4619 0.0994 0.1822 0.1668 ...
## $ event : int [1:24000] 0 1 1 0 0 0 0 0 1 0 ...
summary(d1)
## subj rep condition cond
## Length:24000 Length:24000 cond1:12000 Min. :0.0
## Class :character Class :character cond2:12000 1st Qu.:0.0
## Mode :character Mode :character Median :0.5
## Mean :0.5
## 3rd Qu.:1.0
## Max. :1.0
## bin con inc bin1 bin2
## Min. :1.0 Min. :0.0 Min. :0.0 Min. :0.0000 Min. :0.0000
## 1st Qu.:2.0 1st Qu.:0.0 1st Qu.:0.0 1st Qu.:0.0000 1st Qu.:0.0000
## Median :3.5 Median :0.5 Median :0.5 Median :0.0000 Median :0.0000
## Mean :3.5 Mean :0.5 Mean :0.5 Mean :0.1667 Mean :0.1667
## 3rd Qu.:5.0 3rd Qu.:1.0 3rd Qu.:1.0 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :6.0 Max. :1.0 Max. :1.0 Max. :1.0000 Max. :1.0000
## bin3 bin4 bin5 bin6
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.1667 Mean :0.1667 Mean :0.1667 Mean :0.1667
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## u1c u2c u3c u4c
## Min. :-1.48460 Min. :-3.4790 Min. :-0.80094 Min. :-0.50119
## 1st Qu.:-0.08369 1st Qu.: 0.0192 1st Qu.:-0.16730 1st Qu.:-0.13584
## Median : 0.27455 Median : 0.1766 Median : 0.09514 Median :-0.05036
## Mean : 0.10650 Mean : 0.2040 Mean : 0.12183 Mean :-0.02763
## 3rd Qu.: 0.58677 3rd Qu.: 1.0012 3rd Qu.: 0.69635 3rd Qu.: 0.04075
## Max. : 0.71113 Max. : 2.7615 Max. : 0.86895 Max. : 0.74960
## u5c u6c u1i u2i
## Min. :-0.6096 Min. :-0.600177 Min. :-2.2361 Min. :-1.56358
## 1st Qu.:-0.2354 1st Qu.:-0.119046 1st Qu.:-0.6386 1st Qu.:-0.68951
## Median : 0.1615 Median :-0.007188 Median :-0.2547 Median :-0.09448
## Mean : 0.1655 Mean : 0.102286 Mean :-0.1315 Mean :-0.16867
## 3rd Qu.: 0.4897 3rd Qu.: 0.295545 3rd Qu.: 0.1644 3rd Qu.: 0.54520
## Max. : 0.9510 Max. : 1.182506 Max. : 1.9723 Max. : 0.66309
## u3i u4i u5i u6i
## Min. :-1.4254 Min. :-1.6222 Min. :-2.83353 Min. :-0.71436
## 1st Qu.:-0.5861 1st Qu.:-0.9336 1st Qu.:-1.18382 1st Qu.:-0.30607
## Median : 0.1007 Median :-0.5244 Median :-0.05776 Median :-0.13340
## Mean : 0.3122 Mean :-0.2627 Mean :-0.40813 Mean :-0.02908
## 3rd Qu.: 1.0056 3rd Qu.: 0.9166 3rd Qu.: 0.50680 3rd Qu.: 0.28242
## Max. : 2.1307 Max. : 1.2520 Max. : 0.96731 Max. : 0.95958
## cloglog prob event
## Min. :-6.7190 Min. :0.001207 Min. :0.0000
## 1st Qu.:-3.3974 1st Qu.:0.032979 1st Qu.:0.0000
## Median :-2.5498 Median :0.075129 Median :0.0000
## Mean :-2.6038 Mean :0.141855 Mean :0.1426
## 3rd Qu.:-1.4892 3rd Qu.:0.201921 3rd Qu.:0.0000
## Max. : 0.5825 Max. :0.833127 Max. :1.0000
glimpse(d1)
## Rows: 24,000
## Columns: 28
## $ subj <chr> "subj01", "subj01", "subj01", "subj01", "subj01", "subj01", …
## $ rep <chr> "rep001", "rep001", "rep001", "rep001", "rep001", "rep001", …
## $ condition <fct> cond1, cond1, cond1, cond1, cond1, cond1, cond2, cond2, cond…
## $ cond <dbl> 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, …
## $ bin <int> 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, …
## $ con <dbl> 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, …
## $ inc <dbl> 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, …
## $ bin1 <dbl> 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, …
## $ bin2 <dbl> 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, …
## $ bin3 <dbl> 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, …
## $ bin4 <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, …
## $ bin5 <dbl> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, …
## $ bin6 <dbl> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ u1c <dbl> 0.3612864, 0.3612864, 0.3612864, 0.3612864, 0.3612864, 0.361…
## $ u2c <dbl> 2.761482, 2.761482, 2.761482, 2.761482, 2.761482, 2.761482, …
## $ u3c <dbl> 0.06367222, 0.06367222, 0.06367222, 0.06367222, 0.06367222, …
## $ u4c <dbl> -0.1140557, -0.1140557, -0.1140557, -0.1140557, -0.1140557, …
## $ u5c <dbl> -0.4909776, -0.4909776, -0.4909776, -0.4909776, -0.4909776, …
## $ u6c <dbl> -0.6001775, -0.6001775, -0.6001775, -0.6001775, -0.6001775, …
## $ u1i <dbl> -0.1198404, -0.1198404, -0.1198404, -0.1198404, -0.1198404, …
## $ u2i <dbl> -0.04330945, -0.04330945, -0.04330945, -0.04330945, -0.04330…
## $ u3i <dbl> 2.130691, 2.130691, 2.130691, 2.130691, 2.130691, 2.130691, …
## $ u4i <dbl> -0.9336088, -0.9336088, -0.9336088, -0.9336088, -0.9336088, …
## $ u5i <dbl> 0.5067988, 0.5067988, 0.5067988, 0.5067988, 0.5067988, 0.506…
## $ u6i <dbl> 0.1550822, 0.1550822, 0.1550822, 0.1550822, 0.1550822, 0.155…
## $ cloglog <dbl> -4.2987136, -0.4785179, -2.2563278, -1.6040557, -1.7009776, …
## $ prob <dbl> 0.01349415, 0.46189479, 0.09943631, 0.18215247, 0.16681954, …
## $ event <int> 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, …
## save initial data
# write_csv(d1, "Tutorial_4_planning/data/d1.csv") #
summarise the data produced so far
d1_pid <- d1 %>%
group_by(subj, condition, bin) %>%
summarise(
across(cloglog:event, \(x) mean(x, na.rm = TRUE)),
.groups = "drop"
)
d1_pid
## # A tibble: 120 × 6
## subj condition bin cloglog prob event
## <chr> <fct> <int> <dbl> <dbl> <dbl>
## 1 subj01 cond1 1 -4.30 0.0135 0
## 2 subj01 cond1 2 -0.479 0.462 0.49
## 3 subj01 cond1 3 -2.26 0.0994 0.065
## 4 subj01 cond1 4 -1.60 0.182 0.155
## 5 subj01 cond1 5 -1.70 0.167 0.14
## 6 subj01 cond1 6 -1.20 0.260 0.285
## 7 subj01 cond2 1 -4.49 0.0112 0.015
## 8 subj01 cond2 2 -3.25 0.0379 0.035
## 9 subj01 cond2 3 -0.239 0.545 0.525
## 10 subj01 cond2 4 -3.20 0.0398 0.035
## # ℹ 110 more rows
d1_group <- d1 %>%
group_by(condition, bin) %>%
summarise(
across(cloglog:event, \(x) mean(x, na.rm = TRUE)),
.groups = "drop"
)
d1_group
## # A tibble: 12 × 5
## condition bin cloglog prob event
## <fct> <int> <dbl> <dbl> <dbl>
## 1 cond1 1 -4.55 0.0121 0.01
## 2 cond1 2 -3.04 0.0959 0.0995
## 3 cond1 3 -2.20 0.118 0.118
## 4 cond1 4 -1.52 0.204 0.194
## 5 cond1 5 -1.04 0.316 0.306
## 6 cond1 6 -0.498 0.465 0.490
## 7 cond2 1 -4.50 0.0198 0.021
## 8 cond2 2 -3.38 0.0408 0.0435
## 9 cond2 3 -2.06 0.182 0.174
## 10 cond2 4 -2.53 0.110 0.108
## 11 cond2 5 -3.45 0.0499 0.0515
## 12 cond2 6 -2.48 0.0887 0.096
a quick plot
cloglog
p5.1 <- ggplot(d1_group, aes(x=bin, y = cloglog, colour = condition)) +
geom_line(aes(group = condition)) +
geom_point() +
geom_jitter(data=d1_pid, alpha = 0.5, width = 0.1, height = 0) +
# scale_y_continuous(limits = c(0,1), breaks = seq(0,1,0.25)) +
scale_x_continuous(breaks = 1:bin_n, labels = 1:bin_n) +
scale_fill_brewer(palette = "Dark2") +
scale_colour_brewer(palette = "Dark2") +
labs(title = "cloglog - simulation d1 (N=10)",
x = "time bins")
p5.1

event prob
p5.2 <- ggplot(d1_group, aes(x=bin, y = event, colour = condition)) +
geom_line(aes(group = condition, linetype=condition)) +
geom_point() +
geom_jitter(data=d1_pid, alpha = 0.5, width = 0.1, height = 0) +
scale_y_continuous(limits = c(0,1), breaks = seq(0,1,0.25)) +
scale_x_continuous(breaks = 1:bin_n, labels = 1:bin_n) +
scale_fill_brewer(palette = "Dark2") +
scale_colour_brewer(palette = "Dark2") +
labs(title = "event prob - simulation d1 (N=10)",
x = "time bins")
p5.2

convert the data into outcome data (this next chunk could be
incorporated into the data structure chunk above; I'm doing it
separately here so that you can see what is initially generated)
d1_out <- d1 %>%
group_by(subj, rep, condition) %>%
mutate(cumsum = cumsum(event),
outcomel = event==1 & cumsum(event) < 2,
outcomen = if_else(event==1 & cumsum(event) == 1, 1,
if_else(cumsum(event) >= 1, NA, 0))) %>%
drop_na(outcomen) %>%
ungroup()
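The censoring logic behind outcomen can be checked on a toy event vector: bins before the first event get 0, the first-event bin gets 1, and all later bins get NA (and are then dropped by drop_na()). A minimal sketch of the same rule using base ifelse() rather than dplyr's if_else():

```r
# toy per-trial event sequence across 6 bins
event <- c(0, 0, 1, 0, 1, 0)
cs <- cumsum(event)

# same rule as the pipeline above: 1 at the first event, 0 before it, NA after it
outcomen <- ifelse(event == 1 & cs == 1, 1, ifelse(cs >= 1, NA, 0))
outcomen
```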
head(d1_out)
## # A tibble: 6 × 31
## subj rep condition cond bin con inc bin1 bin2 bin3 bin4 bin5
## <chr> <chr> <fct> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 subj01 rep001 cond1 0 1 1 0 1 0 0 0 0
## 2 subj01 rep001 cond1 0 2 1 0 0 1 0 0 0
## 3 subj01 rep001 cond2 1 1 0 1 1 0 0 0 0
## 4 subj01 rep001 cond2 1 2 0 1 0 1 0 0 0
## 5 subj01 rep001 cond2 1 3 0 1 0 0 1 0 0
## 6 subj01 rep002 cond1 0 1 1 0 1 0 0 0 0
## # ℹ 19 more variables: bin6 <dbl>, u1c <dbl>, u2c <dbl>, u3c <dbl>, u4c <dbl>,
## # u5c <dbl>, u6c <dbl>, u1i <dbl>, u2i <dbl>, u3i <dbl>, u4i <dbl>,
## # u5i <dbl>, u6i <dbl>, cloglog <dbl>, prob <dbl>, event <int>, cumsum <int>,
## # outcomel <lgl>, outcomen <dbl>
## save outcome data
# write_csv(d1_out, "Tutorial_4_planning/data/d1_out.csv")
summarise the outcome data, which is generated from the prob
values
d1_out_pid <- d1_out %>%
group_by(subj, condition, bin) %>%
summarise(outcome = mean(outcomen, na.rm = TRUE)) %>%
mutate(prime = if_else(condition == "cond1", "con", "inc"),
prime = factor(prime))
## `summarise()` has grouped output by 'subj', 'condition'. You can override using
## the `.groups` argument.
d1_out_pid
## # A tibble: 120 × 5
## # Groups: subj, condition [20]
## subj condition bin outcome prime
## <chr> <fct> <int> <dbl> <fct>
## 1 subj01 cond1 1 0 con
## 2 subj01 cond1 2 0.49 con
## 3 subj01 cond1 3 0.0490 con
## 4 subj01 cond1 4 0.175 con
## 5 subj01 cond1 5 0.0875 con
## 6 subj01 cond1 6 0.288 con
## 7 subj01 cond2 1 0.0150 inc
## 8 subj01 cond2 2 0.0305 inc
## 9 subj01 cond2 3 0.545 inc
## 10 subj01 cond2 4 0.0345 inc
## # ℹ 110 more rows
d1_out_group <- d1_out %>%
group_by(condition, bin) %>%
summarise(outcome = mean(outcomen, na.rm = TRUE)) %>%
mutate(prime = if_else(condition == "cond1", "con", "inc"),
prime = factor(prime))
## `summarise()` has grouped output by 'condition'. You can override using the
## `.groups` argument.
d1_out_group
## # A tibble: 12 × 4
## # Groups: condition [2]
## condition bin outcome prime
## <fct> <int> <dbl> <fct>
## 1 cond1 1 0.0100 con
## 2 cond1 2 0.0995 con
## 3 cond1 3 0.121 con
## 4 cond1 4 0.195 con
## 5 cond1 5 0.322 con
## 6 cond1 6 0.5 con
## 7 cond2 1 0.0210 inc
## 8 cond2 2 0.0429 inc
## 9 cond2 3 0.178 inc
## 10 cond2 4 0.108 inc
## 11 cond2 5 0.0473 inc
## 12 cond2 6 0.0963 inc
plot outcome using the prime variable to make it comparable to the
raw data
p5.3 <- ggplot(d1_out_group, aes(x = bin, y = outcome, colour = prime)) +
geom_line(aes(group = prime, linetype = prime)) +
geom_point() +
geom_jitter(data=d1_out_pid, alpha=0.5, width = 0.1, height = 0) +
scale_y_continuous(limits = c(0,1), breaks = seq(0,1,0.25)) +
scale_x_continuous(breaks = 1:bin_n, labels = 1:bin_n) +
scale_colour_brewer(palette = "Dark2") +
labs(title = "simulation d1 (N=10)",
x = "time bins",
y = "outcome")
p5.3

compare the raw data to the simulated data for a single simulated
dataset
p5.4 <- (p3.1 / p5.3) +
plot_layout(guides = 'collect',
axes = 'collect')
p5.4

ggsave ("Tutorial_4_planning/figures/raw_vs_d1.jpeg",
width = 8, height = 8, dpi=800)
ok, this looks reasonably sensible.
11. Run a follow-up simulation
Let’s try a 50%, 60% and 70% reduction across N=10, 15 and 20.
Let’s check the effect sizes quickly.
bin6_v2 <- tibble(
effect_id = 1:3,
inc = rep(0.1, times=3),
ratio = c(0.3, 0.4, 0.5),
con = inc/ratio,
inc_clog = clogloglink(inc),
con_clog = clogloglink(con)
)
bin6_v2
## # A tibble: 3 × 6
## effect_id inc ratio con inc_clog con_clog
## <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 0.1 0.3 0.333 -2.25 -0.903
## 2 2 0.1 0.4 0.25 -2.25 -1.25
## 3 3 0.1 0.5 0.2 -2.25 -1.50
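The clogloglink() calls above map hazards (per-bin event probabilities) onto the cloglog scale via log(-log(1 - p)). A hand-rolled base-R check of the bin-6 values, assuming clogloglink() behaves like the VGAM function:

```r
# cloglog link by hand: log(-log(1 - p))
cloglog <- function(p) log(-log(1 - p))

round(cloglog(0.1), 2)    # inc hazard of .10  -> -2.25
round(cloglog(1/3), 2)    # con hazard of .333 -> -0.90
round(cloglog(0.25), 2)   # con hazard of .25  -> -1.25
round(cloglog(0.2), 2)    # con hazard of .20  -> -1.50
```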
run the sim
vary the effect size and subject N. As before, this will take some
time, probably hours…go get another beer/coffee. And again, eval is set
to FALSE to save time for the tutorial.
plan(multicore)
x2 <- crossing(
exp = 1:1000, # number of experiment replicates
subj_n = c(10,15,20), # range of subject N
rep_n = 200,
b6i = -2.25,
b6c = c(-0.90, -1.25, -1.50)
) %>%
mutate(d = pmap(., index_sim)) %>%
mutate(s = map(d, index_summary)) %>%
select(-d)
summarise the simulated data
sx2 <- x2 %>%
unnest(s) %>%
mutate(exp=factor(exp),
subj_n=factor(subj_n),
rep_n=factor(rep_n),
b6c = factor(b6c,
levels = c("-1.5", "-1.25", "-0.9"),
labels = c("50%", "60%", "70%")),
subj=factor(subj),
bin=factor(bin))
sx2
take a look
head(sx2)
tail(sx2)
sx2 %>%
distinct(b6c)
save out a file
## save the second simulation
write_csv(sx2, "Tutorial_4_planning/data/sim2/sim2_data.csv")
calculate summary data
read in the file, if it has already been computed.
If necessary, read it from the OSF to start with (because it is a
large file). This step only needs to be done once. After that, the
file will be in your local directory and you can skip this chunk in the
future. To do so, set eval=TRUE in the chunk below, then re-set it to
FALSE once the data have been downloaded.
## create a relevant folder, if it does not already exist
sim2_output_dir <- here("Tutorial_4_planning/data/sim2/")
if (!dir.exists(here(sim2_output_dir))) {
dir.create(here(sim2_output_dir))
}
## read in the data from the OSF and store it in a relevant folder, if it does not already exist
sim2_file_name <- here("Tutorial_4_planning/data/sim2/sim2_data.csv")
if (!file.exists(here(sim2_file_name))) {
osf_retrieve_node("3dbcs") %>%
osf_ls_files(pattern = "Tutorial_4_planning") %>%
osf_ls_files(pattern = "data") %>%
osf_ls_files(pattern = "sim2") %>%
osf_ls_files(pattern = "sim2_data") %>%
osf_download(path = sim2_output_dir)
}
read the file from a local folder
sx2 <- read_csv("Tutorial_4_planning/data/sim2/sim2_data.csv") %>%
mutate(exp=factor(exp),
subj_n=factor(subj_n),
rep_n=factor(rep_n),
b6c = factor(b6c,
levels = c("-1.5", "-1.25", "-0.9"),
labels = c("50%", "60%", "70%")),
subj=factor(subj),
bin=factor(bin))
## Rows: 1618687 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): subj, condition
## dbl (11): exp, subj_n, rep_n, b6i, b6c, bin, n, sum, mean_outcome, sd_outcom...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
sx2
## # A tibble: 1,618,687 × 13
## exp subj_n rep_n b6i b6c subj condition bin n sum mean_outcome
## <fct> <fct> <fct> <dbl> <fct> <fct> <chr> <fct> <dbl> <dbl> <dbl>
## 1 1 10 200 -2.25 50% subj… cond1 1 200 1 0.005
## 2 1 10 200 -2.25 50% subj… cond1 2 199 6 0.0302
## 3 1 10 200 -2.25 50% subj… cond1 3 193 7 0.0363
## 4 1 10 200 -2.25 50% subj… cond1 4 186 16 0.0860
## 5 1 10 200 -2.25 50% subj… cond1 5 170 30 0.176
## 6 1 10 200 -2.25 50% subj… cond1 6 140 26 0.186
## 7 1 10 200 -2.25 50% subj… cond2 1 200 1 0.005
## 8 1 10 200 -2.25 50% subj… cond2 2 199 8 0.0402
## 9 1 10 200 -2.25 50% subj… cond2 3 191 14 0.0733
## 10 1 10 200 -2.25 50% subj… cond2 4 177 2 0.0113
## # ℹ 1,618,677 more rows
## # ℹ 2 more variables: sd_outcome <dbl>, sem <dbl>
at the exp level
sx2_exp <- sx2 %>%
group_by(exp, subj_n, rep_n, b6c, condition, bin) %>%
summarise(exp_mean_outcome = mean(mean_outcome, na.rm = TRUE),
exp_mean_sum=mean(sum, na.rm = TRUE),
exp_sum=sum(sum),
n = length(unique(subj)),
sd=sd(mean_outcome, na.rm = TRUE),
sem = (sd/sqrt(length(unique(subj)))),
.groups = "drop")
sx2_exp
## # A tibble: 108,000 × 12
## exp subj_n rep_n b6c condition bin exp_mean_outcome exp_mean_sum
## <fct> <fct> <fct> <fct> <chr> <fct> <dbl> <dbl>
## 1 1 10 200 50% cond1 1 0.014 2.8
## 2 1 10 200 50% cond1 2 0.0449 8.8
## 3 1 10 200 50% cond1 3 0.0550 10.3
## 4 1 10 200 50% cond1 4 0.200 35.4
## 5 1 10 200 50% cond1 5 0.305 43
## 6 1 10 200 50% cond1 6 0.306 28.9
## 7 1 10 200 50% cond2 1 0.013 2.6
## 8 1 10 200 50% cond2 2 0.0410 8.1
## 9 1 10 200 50% cond2 3 0.123 22.3
## 10 1 10 200 50% cond2 4 0.219 31.7
## # ℹ 107,990 more rows
## # ℹ 4 more variables: exp_sum <dbl>, n <int>, sd <dbl>, sem <dbl>
at the sim level
sx2_sim <- sx2_exp %>%
group_by(subj_n, rep_n, b6c, condition, bin) %>%
summarise(sim_mean_outcome = mean(exp_mean_outcome, na.rm = TRUE),
sim_mean_sum = mean(exp_mean_sum, na.rm = TRUE),
sim_sum = mean(exp_sum, na.rm = TRUE),
n = length(unique(exp)),
sd=sd(exp_mean_outcome, na.rm = TRUE),
sem = (sd/sqrt(length(unique(exp)))),
.groups = "drop")
sx2_sim
## # A tibble: 108 × 11
## subj_n rep_n b6c condition bin sim_mean_outcome sim_mean_sum sim_sum
## <fct> <fct> <fct> <chr> <fct> <dbl> <dbl> <dbl>
## 1 10 200 50% cond1 1 0.0162 3.25 32.5
## 2 10 200 50% cond1 2 0.0776 15.2 152.
## 3 10 200 50% cond1 3 0.108 19.4 194.
## 4 10 200 50% cond1 4 0.218 35.0 350.
## 5 10 200 50% cond1 5 0.285 36.0 360.
## 6 10 200 50% cond1 6 0.236 20.9 209.
## 7 10 200 50% cond2 1 0.0235 4.70 47.0
## 8 10 200 50% cond2 2 0.0650 12.6 126.
## 9 10 200 50% cond2 3 0.124 22.3 223.
## 10 10 200 50% cond2 4 0.186 28.8 288.
## # ℹ 98 more rows
## # ℹ 3 more variables: n <int>, sd <dbl>, sem <dbl>
a quick plot of the data
p11.1 <- ggplot(sx2_sim,
aes(x = bin, y = sim_mean_outcome,
colour = condition)) +
geom_jitter(data=sx2_exp, aes(y = exp_mean_outcome),
alpha=0.5, width = 0.1, height = 0) +
geom_line(aes(group = condition)) +
geom_point(size=2, colour = "black") +
geom_errorbar(aes(ymin = sim_mean_outcome-sem*1.96, ymax = sim_mean_outcome+sem*1.96),
width=.2, colour = "black") +
scale_colour_brewer(palette = "Dark2") +
scale_y_continuous(limits = c(0,1), breaks = seq(0,1,0.25)) +
labs(x="time bin",
y="outcome") +
facet_grid(fct_rev(subj_n)~b6c)
p11.1

ggsave ("Tutorial_4_planning/figures/sim2/sim_violin.jpeg",
width = 10, height = 8, dpi=800)
ok, look at some values in bin 6
at the exp level
sx2_exp_check <- sx2_exp %>%
filter(bin %in% c(6)) %>%
select(exp, subj_n, b6c, condition, bin, exp_mean_outcome) %>%
pivot_wider(names_from = "condition",
values_from = "exp_mean_outcome") %>%
group_by(exp, subj_n, b6c, bin) %>%
mutate(ratio = cond2/cond1)
head(sx2_exp_check)
## # A tibble: 6 × 7
## # Groups: exp, subj_n, b6c, bin [6]
## exp subj_n b6c bin cond1 cond2 ratio
## <fct> <fct> <fct> <fct> <dbl> <dbl> <dbl>
## 1 1 10 50% 6 0.306 0.143 0.466
## 2 1 10 60% 6 0.295 0.119 0.403
## 3 1 10 70% 6 0.399 0.104 0.261
## 4 1 15 50% 6 0.185 0.113 0.613
## 5 1 15 60% 6 0.234 0.129 0.550
## 6 1 15 70% 6 0.327 0.123 0.376
and at the sim level
sx2_sim_check <- sx2_exp_check %>%
group_by(subj_n, b6c, bin) %>%
summarise(sim_ratio = mean(ratio),
sim_sd=sd(ratio),
sim_sem=sim_sd/sqrt(1000),
.groups = "drop")
head(sx2_sim_check)
## # A tibble: 6 × 6
## subj_n b6c bin sim_ratio sim_sd sim_sem
## <fct> <fct> <fct> <dbl> <dbl> <dbl>
## 1 10 50% 6 0.517 0.142 0.00450
## 2 10 60% 6 0.429 0.118 0.00373
## 3 10 70% 6 0.328 0.0888 0.00281
## 4 15 50% 6 0.513 0.125 0.00395
## 5 15 60% 6 0.417 0.0968 0.00306
## 6 15 70% 6 0.327 0.0722 0.00228
plot ratio values
p11.2 <- ggplot(sx2_sim_check,
aes(x = b6c, y = sim_ratio)) +
geom_point(size=1.5) +
geom_errorbar(aes(ymin = sim_ratio-sim_sem*1.96, ymax = sim_ratio+sim_sem*1.96)) +
geom_hline(yintercept = c(0.3, 0.4, 0.5), colour = "red") +
scale_colour_brewer(palette = "Dark2") +
scale_y_continuous(limits = c(0,1), breaks = seq(0,1,0.25)) +
labs(x="effect size",
y="hazard ratio (cond2/cond1)") +
facet_wrap(~subj_n)
p11.2

calculate power / precision
Same basic idea as in Simulation 1.
calculate difference scores
at the exp level
sx2_diff_exp <- sx2 %>%
pivot_wider(id_cols = c(exp, subj_n, rep_n, b6c, subj, bin),
names_from = "condition",
values_from = "mean_outcome") %>%
mutate(diff = cond1 - cond2) %>%
group_by(exp, subj_n, rep_n, b6c, bin) %>%
summarise(exp_mean_diff = mean(diff, na.rm = TRUE),
exp_sd = sd(diff, na.rm = TRUE),
n = n(), # n here is the number of subjects per experiment
exp_sem = (exp_sd/sqrt(n)),
exp_ci = 1.96*exp_sem,
exp_dz = exp_mean_diff/exp_sd,
.groups = "drop")
head(sx2_diff_exp)
## # A tibble: 6 × 11
## exp subj_n rep_n b6c bin exp_mean_diff exp_sd n exp_sem exp_ci
## <fct> <fct> <fct> <fct> <fct> <dbl> <dbl> <int> <dbl> <dbl>
## 1 1 10 200 50% 1 0.00100 0.0165 10 0.00521 0.0102
## 2 1 10 200 50% 2 0.00386 0.0777 10 0.0246 0.0482
## 3 1 10 200 50% 3 -0.0676 0.108 10 0.0342 0.0671
## 4 1 10 200 50% 4 -0.0189 0.258 10 0.0816 0.160
## 5 1 10 200 50% 5 0.221 0.184 10 0.0580 0.114
## 6 1 10 200 50% 6 0.163 0.176 10 0.0558 0.109
## # ℹ 1 more variable: exp_dz <dbl>
tail(sx2_diff_exp)
## # A tibble: 6 × 11
## exp subj_n rep_n b6c bin exp_mean_diff exp_sd n exp_sem exp_ci
## <fct> <fct> <fct> <fct> <fct> <dbl> <dbl> <int> <dbl> <dbl>
## 1 1000 20 200 70% 1 -0.00675 0.0369 20 0.00825 0.0162
## 2 1000 20 200 70% 2 -0.0174 0.0696 20 0.0156 0.0305
## 3 1000 20 200 70% 3 0.0574 0.0611 20 0.0137 0.0268
## 4 1000 20 200 70% 4 0.0291 0.231 20 0.0517 0.101
## 5 1000 20 200 70% 5 0.180 0.131 20 0.0294 0.0576
## 6 1000 20 200 70% 6 0.277 0.184 20 0.0413 0.0809
## # ℹ 1 more variable: exp_dz <dbl>
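exp_dz in the summary above is Cohen's dz, the within-subject standardised effect size: the mean of the per-subject difference scores divided by their SD. A small sketch with hypothetical difference scores:

```r
# hypothetical per-subject bin-6 difference scores (cond1 - cond2)
diff <- c(0.16, 0.18, 0.30, 0.07, 0.11, 0.20)

# dz = mean difference / SD of differences
mean(diff) / sd(diff)
```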
at the group/sim level
sx2_diff_sim <- sx2_diff_exp %>%
group_by(subj_n, rep_n, b6c, bin) %>%
summarise(mean_diff = mean(exp_mean_diff, na.rm = TRUE),
sd = sd(exp_mean_diff, na.rm = TRUE),
n=n(),
sem = (sd/sqrt((n))),
ci = 1.96*sem)
## `summarise()` has grouped output by 'subj_n', 'rep_n', 'b6c'. You can override
## using the `.groups` argument.
sx2_diff_sim
## # A tibble: 54 × 9
## # Groups: subj_n, rep_n, b6c [9]
## subj_n rep_n b6c bin mean_diff sd n sem ci
## <fct> <fct> <fct> <fct> <dbl> <dbl> <int> <dbl> <dbl>
## 1 10 200 50% 1 -0.00728 0.0125 1000 0.000394 0.000772
## 2 10 200 50% 2 0.0126 0.0408 1000 0.00129 0.00253
## 3 10 200 50% 3 -0.0157 0.0386 1000 0.00122 0.00239
## 4 10 200 50% 4 0.0320 0.0734 1000 0.00232 0.00455
## 5 10 200 50% 5 0.192 0.0572 1000 0.00181 0.00355
## 6 10 200 50% 6 0.118 0.0496 1000 0.00157 0.00308
## 7 10 200 60% 1 -0.00780 0.0139 1000 0.000439 0.000861
## 8 10 200 60% 2 0.0138 0.0391 1000 0.00124 0.00242
## 9 10 200 60% 3 -0.0147 0.0388 1000 0.00123 0.00240
## 10 10 200 60% 4 0.0335 0.0685 1000 0.00217 0.00424
## # ℹ 44 more rows
plot
violin
diff in original units
p11.3 <- ggplot(sx2_diff_exp, aes(x=bin, y = exp_mean_diff,
colour = bin, fill = bin)) +
geom_jitter(alpha = 0.5, width = 0.1) +
geom_violin(alpha = 0.7) +
geom_point(data = sx2_diff_sim,
aes(y = mean_diff), size = 3, position=pd2, colour="black") +
geom_errorbar(data = sx2_diff_sim,
aes(y = mean_diff, ymin = mean_diff-ci, ymax = mean_diff+ci),
width=.2, position=pd2, colour = "black") +
geom_hline(yintercept = 0, colour = "black", linetype = "dashed") +
scale_fill_brewer(palette = "Dark2") +
scale_colour_brewer(palette = "Dark2") +
theme(legend.position = "none") +
ylab("mean outcome") +
ggtitle("difference score (cond1-cond2)") +
facet_grid(fct_rev(subj_n)~b6c)
p11.3

ggsave("Tutorial_4_planning/figures/sim2/sim_diffs.jpeg",
width = 10, height = 8, dpi=800)
plot each exp’s difference score and associated 95% interval
create some factors and new variables
sx2_diff_exp_2 <- sx2_diff_exp %>%
group_by(exp, subj_n, rep_n, b6c, bin) %>%
mutate(lower = exp_mean_diff-exp_ci,
upper = exp_mean_diff+exp_ci,
above_zero = if_else(lower > 0, "yes", "no"),
above_zero = factor(above_zero, levels = c("no", "yes")))
sx2_diff_exp_2
## # A tibble: 54,000 × 14
## # Groups: exp, subj_n, rep_n, b6c, bin [54,000]
## exp subj_n rep_n b6c bin exp_mean_diff exp_sd n exp_sem exp_ci
## <fct> <fct> <fct> <fct> <fct> <dbl> <dbl> <int> <dbl> <dbl>
## 1 1 10 200 50% 1 0.00100 0.0165 10 0.00521 0.0102
## 2 1 10 200 50% 2 0.00386 0.0777 10 0.0246 0.0482
## 3 1 10 200 50% 3 -0.0676 0.108 10 0.0342 0.0671
## 4 1 10 200 50% 4 -0.0189 0.258 10 0.0816 0.160
## 5 1 10 200 50% 5 0.221 0.184 10 0.0580 0.114
## 6 1 10 200 50% 6 0.163 0.176 10 0.0558 0.109
## 7 1 10 200 60% 1 -0.006 0.0156 10 0.00493 0.00967
## 8 1 10 200 60% 2 -0.0135 0.0560 10 0.0177 0.0347
## 9 1 10 200 60% 3 0.0214 0.0878 10 0.0278 0.0544
## 10 1 10 200 60% 4 -0.00657 0.220 10 0.0696 0.136
## # ℹ 53,990 more rows
## # ℹ 4 more variables: exp_dz <dbl>, lower <dbl>, upper <dbl>, above_zero <fct>
calculate “power” in a quick and dirty way based on 95% CI
power_x2 <- sx2_diff_exp_2 %>%
group_by(subj_n, rep_n, b6c, bin) %>%
mutate(check = ifelse(lower > 0, 1, 0)) %>%
summarise(power = mean(check, na.rm = TRUE)) %>%
filter(bin %in% c(6))
## `summarise()` has grouped output by 'subj_n', 'rep_n', 'b6c'. You can override
## using the `.groups` argument.
power_x2
## # A tibble: 9 × 5
## # Groups: subj_n, rep_n, b6c [9]
## subj_n rep_n b6c bin power
## <fct> <fct> <fct> <fct> <dbl>
## 1 10 200 50% 6 0.682
## 2 10 200 60% 6 0.869
## 3 10 200 70% 6 0.983
## 4 15 200 50% 6 0.831
## 5 15 200 60% 6 0.956
## 6 15 200 70% 6 0.997
## 7 20 200 50% 6 0.932
## 8 20 200 60% 6 0.986
## 9 20 200 70% 6 0.999
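So "power" here is simply the proportion of the 1000 simulated experiments whose 95% CI lower bound sits above zero. The same definition applied to hypothetical CI lower bounds:

```r
set.seed(1)
# hypothetical 95% CI lower bounds from 1000 simulated experiments
lower <- rnorm(1000, mean = 0.05, sd = 0.05)

# power = proportion of experiments whose interval excludes zero
mean(lower > 0)
```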
plot power
p11.4 <- ggplot(power_x2, aes(x = b6c, y=subj_n, fill = power)) +
geom_tile() +
geom_text(aes(label = sprintf("%.3f", power)), color = "white", size = 10) +
scale_fill_viridis_c(limits = c(0, 1)) +
facet_wrap(~bin) +
labs(y="subj_n", x="b6c")
p11.4

ggsave ("Tutorial_4_planning/figures/sim2/power.jpeg",
width = 10, height = 6, dpi=800)
join power to the df
sx2_diff_power <- sx2_diff_exp_2 %>%
filter(bin %in% c(6)) %>%
inner_join(power_x2, by = c("subj_n", "rep_n", "b6c", "bin")) %>%
mutate(power = round(power * 100, 2))
head(sx2_diff_power)
## # A tibble: 6 × 15
## # Groups: exp, subj_n, rep_n, b6c, bin [6]
## exp subj_n rep_n b6c bin exp_mean_diff exp_sd n exp_sem exp_ci
## <fct> <fct> <fct> <fct> <fct> <dbl> <dbl> <int> <dbl> <dbl>
## 1 1 10 200 50% 6 0.163 0.176 10 0.0558 0.109
## 2 1 10 200 60% 6 0.176 0.138 10 0.0435 0.0853
## 3 1 10 200 70% 6 0.295 0.221 10 0.0698 0.137
## 4 1 15 200 50% 6 0.0716 0.0938 15 0.0242 0.0475
## 5 1 15 200 60% 6 0.106 0.135 15 0.0348 0.0683
## 6 1 15 200 70% 6 0.204 0.155 15 0.0401 0.0785
## # ℹ 5 more variables: exp_dz <dbl>, lower <dbl>, upper <dbl>, above_zero <fct>,
## # power <dbl>
plot
p11.5 <- sx2_diff_power %>%
ggplot(aes(x = exp, y = exp_mean_diff, ymin = lower, ymax = upper)) +
geom_pointrange(fatten = 1/2, aes(colour=above_zero)) +
geom_hline(yintercept = 0, colour = "red") +
scale_colour_manual(values=c("darkgrey","black")) +
geom_text(aes(x=700, y=-0.35, label = sprintf("%.1f%s", power, "% power")),
color = "darkgrey", size = 5) +
theme(legend.position = "none") +
labs(x = "sim # (i.e., simulation index)",
y = "hazard difference") +
scale_x_discrete(breaks = c(250,500,750,1000)) +
facet_grid(fct_rev(subj_n)~b6c)
p11.5

ggsave ("Tutorial_4_planning/figures/sim2/bin6_diffs.jpeg",
width = 10, height = 8, dpi=800)
plot power as a bar plot
p11.6 <- ggplot(power_x2, aes(x=subj_n, y=power,
colour = b6c, fill = b6c)) +
geom_col(alpha = 0.5) +
geom_hline(yintercept = 0.8, colour = "red", linetype = "dashed") +
geom_hline(yintercept = 0.9, colour = "black", linetype = "dashed") +
geom_hline(yintercept = 0.95, colour = "blue", linetype = "dashed") +
scale_colour_brewer(palette = "Dark2") +
scale_fill_brewer(palette = "Dark2") +
labs(title = "Simulation 2") +
theme(legend.position = "none") +
facet_wrap(~b6c, nrow=1)
p11.6

ggsave ("Tutorial_4_planning/figures/sim2/bin6_power_col.jpeg",
width = 8, height = 6, dpi=800)
A summary of these power calculations might be as follows:
For a 70% reduction (0.3 hazard ratio), N=10 with 200 trials per
condition would give nearly 100% power. For a 60% reduction (0.4 hazard
ratio), N=10 with 200 trials per condition would give nearly 90% power.
For a 50% reduction (0.5 hazard ratio), N=15 with 200 trials per
condition would give over 80% power.
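As a very rough cross-check on these numbers, one can ignore the multilevel structure entirely and treat bin 6 as a simple two-proportion comparison of hazards 0.10 vs 0.20 (the 50% reduction scenario) with 200 independent trials per condition. stats::power.prop.test then lands in the same ballpark; this is a crude benchmark only, since it ignores subjects, repeated trials and the earlier bins:

```r
# crude benchmark: two independent proportions, 200 trials per group
power.prop.test(n = 200, p1 = 0.10, p2 = 0.20, sig.level = 0.05)$power
```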
And a conclusion / judgment / decision process might be:
Well, as is almost always the case, it depends on your objectives.
Some considerations might be…
How much power or precision are you looking to obtain in this
particular study? Are you running multiple studies that have some form
of replication built in? What resources do you have at your disposal,
such as time, money and personnel? How easy or difficult is it to obtain
the specific type of sample?
My thoughts for studies in my lab might be something like this…
Pick 0.4 or 0.5 as a target effect size since this is much smaller
than that observed in published studies, then pick the corresponding N
value (i.e., N=10 or N=15) that takes you over the 80% power mark.
But, and this is an important “but”, do not rely
solely on one study. Run a follow-up experiment that replicates and
extends the initial result. By doing so, you avoid the Cult of the
Isolated Single Study and reduce the reliance on any one type of
power analysis. Instead, you are aiming for common patterns across two
or more experiments, rather than trying to make the case that a single
study has sufficient evidential value to hit some criterion mark.